Abstract —In real clustering applications, proximity data, in
which only pairwise similarities or dissimilarities are known,
is more general than object data, in which each pattern is
described explicitly by a list of attributes. Medoid-based
clustering algorithms, which assume that the prototypes of classes
are objects, are of great value for partitioning relational data
sets. In this paper a new prototype-based clustering method, named
Evidential C-Medoids (ECMdd), is proposed; it extends Fuzzy
C-Medoids (FCMdd) to the theoretical framework of belief
functions. In ECMdd, medoids are utilized as the prototypes to
represent the detected classes, including specific classes and
imprecise classes. Specific classes are for the data which are
distinctly far from the prototypes of other classes, while
imprecise classes accept the objects that may be close to the
prototypes of more than one class. This soft decision mechanism
makes the clustering results more cautious and reduces the
misclassification rate. Experiments on synthetic and real data
sets illustrate the performance of ECMdd. The results show that
ECMdd captures well the uncertainty in the internal data
structure. Moreover, it is more robust to the initialization than
FCMdd.

Index Terms —Credal partitions; Relational clustering; Evidential
c-medoids; Imprecise classes.

I. Introduction 

Clustering is a useful technique to detect the underlying
cluster structure of a data set. The goal of clustering is
to partition a set of objects $X = \{x_1, x_2, \ldots, x_n\}$ into
$c$ small subgroups $\Omega = \{\omega_1, \omega_2, \ldots, \omega_c\}$
based on a well-defined measure of similarity between patterns.
To measure the similarities (or dissimilarities), the objects are
described by either object data or relational data. Object data are
described explicitly by a feature vector, while relational data arise
from the pairwise similarities or dissimilarities. Among the
existing approaches to clustering, objective function-driven
or prototype-based clustering, such as C-Means (CM) and
Fuzzy C-Means (FCM), is one of the most widely applied
paradigms in statistical pattern recognition. These methods are
based on a fundamentally very simple, but nevertheless very
effective idea, namely to describe the data under consideration
by a set of prototypes. The prototypes capture the characteristics
of the data distribution (like location, size, and shape), and the
data set is classified based on the similarities (or dissimilarities)
of the objects to their prototypes.

The clustering algorithms mentioned above, CM and FCM, are
designed for object data. The prototype of each class in these
methods is the center of gravity of all the included patterns.
For relational data sets, however, it is difficult to determine the
centers of objects. In this case, the object which is most similar
to the center is the most rational choice for the prototype. This
is the idea of clustering using medoids. Some clustering methods,
such as Partitioning Around Medoids (PAM) [1] and Fuzzy C-Medoids
(FCMdd) [2], produce hard and soft clusters respectively, where
each cluster is represented by a representative object (medoid).

Belief functions have already been applied in many fields,
such as data classification [3], data clustering [4], [5], social
network analysis [6] and statistical estimation [7].
Evidential C-Means (ECM) [4] is a recently proposed clustering
method that produces credal partitions for object data. The credal
partition is a general extension of the crisp (hard) and fuzzy ones.
It allows an object to belong not only to single clusters, but also
to any subset of the set of clusters
$\Omega = \{\omega_1, \ldots, \omega_c\}$, by allocating a mass of
belief for each object in $X$ over the power set $2^\Omega$. The
additional flexibility brought by the power set provides more
refined partitioning results than the other techniques, allowing us
to gain a deeper insight into the data [4]. In this paper, we
introduce an extension of FCMdd on the framework of belief
functions. An evidential clustering algorithm for relational data
sets, named ECMdd, is proposed to produce the optimal credal
partition; each class is represented by a medoid, which is assumed
to belong to the original data set. The experimental results show
the effectiveness of the method and illustrate the advantages of
credal partitions.

The rest of this paper is organized as follows. In Section 
II, some basic knowledge and the rationale of our method 
are briefly introduced. In Section III the proposed ECMdd 
clustering approach is presented in detail. In Section IV we 
test ECMdd using various data sets and compare it with several 
other classical methods. Finally, we conclude and present some 
perspectives in Section V. 

II. Background 
A. Theory of belief functions 

Let $\Omega = \{\omega_1, \omega_2, \ldots, \omega_c\}$ be the finite
domain of $X$, called the discernment frame. The belief functions
are defined on the power set $2^\Omega = \{A : A \subseteq \Omega\}$.

The function $m : 2^\Omega \to [0, 1]$ is said to be the Basic Belief
Assignment (bba) on $2^\Omega$, if it satisfies:

$$ \sum_{A \subseteq \Omega} m(A) = 1. \tag{1} $$

Every $A \in 2^\Omega$ such that $m(A) > 0$ is called a focal element.
The credibility and plausibility functions are defined as in
Eq. (2) and Eq. (3).

$$ Bel(A) = \sum_{B \subseteq A,\, B \neq \emptyset} m(B), \quad \forall A \subseteq \Omega, \tag{2} $$

$$ Pl(A) = \sum_{B \cap A \neq \emptyset} m(B), \quad \forall A \subseteq \Omega. \tag{3} $$


Each quantity $Bel(A)$ measures the total support given to $A$,
while $Pl(A)$ represents the potential amount of support to $A$.

A belief function on the credal level can be transformed
into a probability function by Smets' method [8]. In this
method, each mass of belief $m(A)$ is equally distributed
among the elements of $A$. This leads to the concept of pignistic
probability, $BetP$, defined by

$$ BetP(\omega_i) = \sum_{\omega_i \in A \subseteq \Omega} \frac{m(A)}{|A| (1 - m(\emptyset))}, \tag{4} $$

where $|A|$ is the number of elements of $\Omega$ in $A$.
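The three quantities in Eqs. (2)-(4) can be checked on a small example. The following is a minimal sketch, assuming a bba is stored as a Python dictionary that maps focal sets (`frozenset` objects) to masses; the function names `bel`, `pl` and `betp` and the toy masses are illustrative choices, not part of the original method.

```python
def bel(m, A):
    """Eq. (2): total mass of the non-empty subsets of A."""
    return sum(v for B, v in m.items() if B and B <= A)


def pl(m, A):
    """Eq. (3): total mass of the focal sets that intersect A."""
    return sum(v for B, v in m.items() if B & A)


def betp(m, omega):
    """Eq. (4): pignistic probability of the single class omega."""
    conflict = m.get(frozenset(), 0.0)  # mass given to the empty set
    return sum(v / (len(A) * (1.0 - conflict))
               for A, v in m.items() if omega in A)


# Toy bba on the frame {w1, w2} (hypothetical values).
m = {
    frozenset({"w1"}): 0.5,
    frozenset({"w2"}): 0.2,
    frozenset({"w1", "w2"}): 0.3,
}
A = frozenset({"w1"})
print(bel(m, A), pl(m, A), betp(m, "w1"))
```

In this toy case the mass on the imprecise set {w1, w2} is counted by the plausibility of {w1} but not by its credibility, and it is split evenly between the two classes by the pignistic transformation.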


B. Evidential c-means 

Evidential c-means [4] is a direct generalization of FCM
in the framework of belief functions based on the concept
of credal partitions. The credal partition takes advantage of
imprecise (meta) classes to express partial knowledge of
class memberships. In ECM, the evidential membership of an
object $x_i$ is represented by a bba
$m_i = (m_i(A_k) : A_k \subseteq \Omega)$ $(i = 1, 2, \ldots, n)$ over
the given frame of discernment $\Omega$. The set
$\{A_k \mid A_k \subseteq \Omega, k = 1, 2, \ldots, 2^c\}$ contains all
the focal elements. The optimal credal partition is obtained by
minimizing the following objective function:

$$ J_{ECM} = \sum_{i=1}^{n} \sum_{A_k \subseteq \Omega,\, A_k \neq \emptyset} |A_k|^{\alpha} m_i(A_k)^{\beta} d_{ik}^2 + \sum_{i=1}^{n} \delta^2 m_i(\emptyset)^{\beta} \tag{5} $$

constrained on

$$ \sum_{A_k \subseteq \Omega,\, A_k \neq \emptyset} m_i(A_k) + m_i(\emptyset) = 1, \tag{6} $$

and

$$ m_i(A_k) \geq 0, \quad m_i(\emptyset) \geq 0, \tag{7} $$

where $m_i(A_k) \triangleq m_{ik}$ is the bba of $x_i$ given to the
nonempty set $A_k$, while $m_i(\emptyset) \triangleq m_{i\emptyset}$
is the bba of $x_i$ assigned to the empty set. Parameter $\alpha$ is
a tuning parameter that controls the degree of penalization for
subsets with high cardinality, parameter $\beta$ is a weighting
exponent and $\delta$ is an adjustable threshold for detecting the
outliers. Here $d_{ik}$ denotes the distance (generally Euclidean
distance) between $x_i$ and the barycenter (i.e., prototype, denoted
by $\bar{v}_k$) associated with $A_k$:

$$ d_{ik}^2 = \| x_i - \bar{v}_k \|^2, \tag{8} $$

where $\bar{v}_k$ is defined mathematically by

$$ \bar{v}_k = \frac{1}{|A_k|} \sum_{\omega_h \in A_k} v_h. \tag{9} $$


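The notation $v_h$ is the geometrical center of the points in
cluster $h$. The update process with Euclidean distance is given by
the following two alternating steps.

• Assignment update, $\forall i$, $\forall k / A_k \subseteq \Omega, A_k \neq \emptyset$:

$$ m_{ik} = \frac{|A_k|^{-\alpha/(\beta-1)} d_{ik}^{-2/(\beta-1)}}{\sum_{A_h \neq \emptyset} |A_h|^{-\alpha/(\beta-1)} d_{ih}^{-2/(\beta-1)} + \delta^{-2/(\beta-1)}}, \tag{10} $$

and for $A_k = \emptyset$

$$ m_{i\emptyset} = 1 - \sum_{A_k \neq \emptyset} m_{ik}, \quad \forall i = 1, 2, \ldots, n. \tag{11} $$

• Prototype update: The prototypes (centers) of the classes
are given by the rows of the matrix $V_{c \times p}$, which is the
solution of the following linear system:

$$ HV = B, \tag{12} $$

where $H$ is a matrix of size $(c \times c)$ given by

$$ H_{lk} = \sum_{i=1}^{n} \sum_{A_h \supseteq \{\omega_k, \omega_l\}} |A_h|^{\alpha-2} m_{ih}^{\beta}, \tag{13} $$

and $B$ is a matrix of size $(c \times p)$ defined by

$$ B_{lq} = \sum_{i=1}^{n} x_{iq} \sum_{A_k \ni \omega_l} |A_k|^{\alpha-1} m_{ik}^{\beta}. \tag{14} $$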

C. Fuzzy c-medoids 

Fuzzy C-Medoids (FCMdd) is a variation of classical
c-means clustering designed for relational data [2]. Let
$X = \{x_i \mid i = 1, 2, \ldots, n\}$ be the set of $n$ objects and
$\tau(x_i, x_j) \triangleq \tau_{ij}$ denote the dissimilarity between
objects $x_i$ and $x_j$. Each object may or may not be represented by
a feature vector. Let $V = \{v_1, v_2, \ldots, v_c\}$, $v_i \in X$,
represent a subset of $X$. The objective function of FCMdd is given as

$$ J_{FCMdd} = \sum_{i=1}^{n} \sum_{j=1}^{c} u_{ij}^{\beta} \tau(x_i, v_j) \tag{15} $$

subject to

$$ \sum_{j=1}^{c} u_{ij} = 1, \quad i = 1, 2, \ldots, n, \tag{16} $$

and

$$ u_{ij} \geq 0, \quad i = 1, 2, \ldots, n, \; j = 1, 2, \ldots, c. \tag{17} $$


In fact, the objective function of FCMdd is similar to that of
FCM. The main difference is that the prototype of a class
in FCMdd is defined as the medoid, i.e., one of the objects in
the original data set, instead of the centroid (the average point
in a continuous space) used in FCM. FCMdd is performed by the
following alternating update steps:

• Assignment update:

$$ u_{ij} = \frac{\tau(x_i, v_j)^{-1/(\beta-1)}}{\sum_{k=1}^{c} \tau(x_i, v_k)^{-1/(\beta-1)}}. \tag{18} $$

• Prototype update: the new prototype of cluster $j$ is set to
be $v_j = x_{l^*}$ with

$$ x_{l^*} = \arg\min_{\{v_j : v_j = x_l (\in X)\}} \sum_{i=1}^{n} u_{ij}^{\beta} \tau(x_i, v_j). \tag{19} $$
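The two alternating FCMdd steps can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `fcmdd`, the list-of-lists dissimilarity matrix `tau`, the stopping rule (medoids unchanged) and the one-dimensional toy data are assumptions made here, and distinct objects are assumed to have strictly positive dissimilarities.

```python
def fcmdd(tau, medoids, beta=2.0, max_iter=100):
    """Alternate Eq. (18) and Eq. (19) until the medoids stop changing.

    tau     : n x n dissimilarity matrix (list of lists)
    medoids : initial medoid indices, one per cluster
    """
    n = len(tau)
    medoids = list(medoids)
    u = []
    for _ in range(max_iter):
        # Assignment update, Eq. (18).
        u = []
        for i in range(n):
            if i in medoids:
                # A medoid has zero dissimilarity to itself: full membership.
                own = medoids.index(i)
                row = [1.0 if j == own else 0.0 for j in range(len(medoids))]
            else:
                w = [tau[i][v] ** (-1.0 / (beta - 1.0)) for v in medoids]
                total = sum(w)
                row = [x / total for x in w]
            u.append(row)

        # Prototype update, Eq. (19): best object for each cluster.
        new_medoids = []
        for j in range(len(medoids)):
            def cost(l, j=j):
                return sum(u[i][j] ** beta * tau[i][l] for i in range(n))
            new_medoids.append(min(range(n), key=cost))

        if new_medoids == medoids:
            break
        medoids = new_medoids
    return u, medoids


# Toy relational data: absolute differences between points on a line.
points = [0, 1, 2, 10, 11, 12]
tau = [[abs(a - b) for b in points] for a in points]
u, medoids = fcmdd(tau, medoids=[0, 5])
print(medoids)
```

Because each candidate medoid is scored by a weighted sum of its dissimilarities to all objects, the prototype update needs only the matrix `tau` and never a feature-space average.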






III. Evidential c-medoids clustering 

Here we introduce the evidential c-medoids clustering algorithm,
which uses medoids in order to take advantage of both medoid-based
clustering and credal partitions. This partitioning evidential
clustering algorithm is mainly related to fuzzy c-medoids. As for
all prototype-based clustering methods, an objective function
should first be found for ECMdd to provide an immediate measure of
the quality of partitions. Hence our goal can be characterized as
the optimization of the objective function to get the best credal
partition.


A. The objective function 

As before, let $X = \{x_i \mid i = 1, 2, \ldots, n\}$ be the set of $n$
objects and $\tau(x_i, x_j) \triangleq \tau_{ij}$ denote the
dissimilarity between objects $x_i$ and $x_j$. The pairwise
dissimilarity is the only information required for the analyzed
data set. The objective function of ECMdd is similar to that in ECM:

$$ J_{ECMdd}(M, V) = \sum_{i=1}^{n} \sum_{A_j \subseteq \Omega,\, A_j \neq \emptyset} |A_j|^{\alpha} m_{ij}^{\beta} d_{ij} + \sum_{i=1}^{n} \delta^2 m_{i\emptyset}^{\beta} \tag{20} $$

constrained on

$$ \sum_{A_j \subseteq \Omega,\, A_j \neq \emptyset} m_{ij} + m_{i\emptyset} = 1, \tag{21} $$

where $m_{ij} \triangleq m_i(A_j)$ is the bba of $x_i$ given to the
nonempty set $A_j$, $m_{i\emptyset} \triangleq m_i(\emptyset)$ is the
bba of $x_i$ assigned to the empty set, and
$d_{ij} \triangleq d(x_i, A_j)$ is the dissimilarity between $x_i$ and
focal set $A_j$. Parameters $\alpha, \beta, \delta$ are adjustable
with the same meanings as those in ECM. Note that $J_{ECMdd}$ depends
on the credal partition $M$ and the set $V$ of all prototypes.

Let $v_k^{\Omega}$ be the prototype of specific cluster (whose focal
element is a singleton) $A_j = \{\omega_k\}$ $(k = 1, 2, \ldots, c)$
and assume that it must be one of the objects in $X$. The
dissimilarity between object $x_i$ and cluster (focal set) $A_j$ can
be defined as follows. If $|A_j| = 1$, i.e., $A_j$ is associated with
one of the singleton clusters in $\Omega$ (say $\omega_k$ with
prototype $v_k^{\Omega}$, i.e., $A_j = \{\omega_k\}$), then the
dissimilarity between $x_i$ and $A_j$ is defined by

$$ d_{ij} = d(x_i, A_j) = \tau(x_i, v_k^{\Omega}). \tag{22} $$

When $|A_j| > 1$, it represents an imprecise (meta) cluster.
If object $x_i$ is to be partitioned into a meta cluster, two
conditions should be satisfied [?]. One condition is that the
dissimilarity values between $x_i$ and the prototypes of the
included singleton classes are similar. The other condition is that
the object should be close to the prototypes of all these specific
clusters. The former measures the degree of uncertainty, while the
latter avoids the pitfall of partitioning data objects that are
irrelevant to any of the included specific clusters into the
corresponding imprecise classes. Therefore, the medoid (prototype)
of an imprecise class $A_j$ could be set to one of the objects with
similar dissimilarities to all the prototypes of the specific
classes $\omega_k \in A_j$ included in $A_j$. The variance of the
dissimilarities of object $x_i$ to the medoids of all the included
specific classes of $A_j$ could be taken into account to express the
degree of uncertainty. The smaller the variance is, the higher the
uncertainty we have for object $x_i$. Meanwhile the medoid should be
close to all the prototypes of the specific classes. This rules out
the outliers, which may have equal dissimilarities to the prototypes
of some specific classes but are obviously not a good choice for
representing the associated imprecise classes. Let $v_j^{2^\Omega}$
denote the medoid of class $A_j$.¹ Based on the above analysis, the
medoid of $A_j$ should be set to $v_j^{2^\Omega} = x_p$ with

$$ p = \arg\min_{i : x_i \in X} \Big\{ f\big(\{\tau(x_i, v_k^{\Omega});\ \omega_k \in A_j\}\big) + \eta \frac{1}{|A_j|} \sum_{\omega_k \in A_j} \tau(x_i, v_k^{\Omega}) \Big\}, \tag{23} $$

where $\omega_k$ is the element of $A_j$, $v_k^{\Omega}$ is its
corresponding prototype and $f$ denotes the function describing the
variance among the corresponding dissimilarity values. The variance
function could be used directly:

$$ \mathrm{Var}_{ij} = \frac{1}{|A_j|} \sum_{\omega_k \in A_j} \Big[ \tau(x_i, v_k^{\Omega}) - \frac{1}{|A_j|} \sum_{\omega_l \in A_j} \tau(x_i, v_l^{\Omega}) \Big]^2. \tag{24} $$

In this paper, we use the following function to describe the
variance $\rho_{ij}$ of the dissimilarities between object $x_i$ and
the medoids of the involved specific classes in $A_j$:

$$ \rho_{ij} = \frac{1}{\mathrm{choose}(|A_j|, 2)} \sum_{\omega_x, \omega_y \in A_j} \sqrt{\big( \tau(x_i, v_x^{\Omega}) - \tau(x_i, v_y^{\Omega}) \big)^2}, \tag{25} $$

where choose$(a, b)$ is the number of combinations of the given
$a$ elements taken $b$ at a time.

The dissimilarity between object $x_i$ and class $A_j$ can be
defined as

$$ d_{ij} = \frac{\tau(x_i, v_j^{2^\Omega}) + \gamma \frac{1}{|A_j|} \sum_{\omega_k \in A_j} \tau(x_i, v_k^{\Omega})}{1 + \gamma}. \tag{26} $$


As we can see from the above equation, the dissimilarity
between object $x_i$ and meta class $A_j$ $(|A_j| > 1)$ is the
weighted average of the dissimilarities of $x_i$ to all the involved
singleton cluster medoids and to the prototype of the imprecise
class $A_j$, with a tuning factor $\gamma$. If $A_j$ is a specific
class with $A_j = \{\omega_k\}$ $(|A_j| = 1)$, the dissimilarity
between $x_i$ and $A_j$ degrades to the dissimilarity between $x_i$
and $v_k^{\Omega}$ as defined in Eq. (22), i.e.,
$v_j^{2^\Omega} = v_k^{\Omega}$. If $|A_j| > 1$, its medoid is
decided by Eq. (23).
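To make Eqs. (22), (23), (25) and (26) concrete, the sketch below computes the medoid of an imprecise class and the object-to-class dissimilarity from a dissimilarity matrix. It is a hedged illustration: the function names (`rho`, `meta_medoid`, `dissimilarity`), the representation of a focal set by the list `protos` of its singleton medoid indices, and the toy data are assumptions introduced here. The sum in Eq. (25) is taken over unordered pairs, which matches the choose(|A_j|, 2) normalization.

```python
from itertools import combinations


def rho(tau, i, protos):
    """Eq. (25): mean absolute gap between the dissimilarities of object i
    to each pair of singleton medoids in `protos`."""
    pairs = list(combinations(protos, 2))
    return sum(abs(tau[i][x] - tau[i][y]) for x, y in pairs) / len(pairs)


def meta_medoid(tau, protos, eta=1.0):
    """Eq. (23) with f = rho: index of the medoid of an imprecise class."""
    def score(i):
        mean = sum(tau[i][k] for k in protos) / len(protos)
        return rho(tau, i, protos) + eta * mean
    return min(range(len(tau)), key=score)


def dissimilarity(tau, i, protos, gamma=1.0, eta=1.0):
    """Eq. (22) for a singleton class, Eq. (26) for an imprecise class."""
    if len(protos) == 1:
        return tau[i][protos[0]]
    v = meta_medoid(tau, protos, eta)
    mean = sum(tau[i][k] for k in protos) / len(protos)
    return (tau[i][v] + gamma * mean) / (1.0 + gamma)


# Toy data: points on a line, singleton medoids at points 1 and 9.
points = [0, 1, 2, 5, 8, 9, 10]
tau = [[abs(a - b) for b in points] for a in points]
protos = [1, 5]                      # indices of the two singleton medoids
print(meta_medoid(tau, protos))      # object lying midway between them
print(dissimilarity(tau, 3, protos))
```

In this toy case the object at point 5 is equally dissimilar to both singleton medoids and close to both, so it is selected as the medoid of the imprecise class, while the object at point 0 is penalized by both terms of Eq. (23).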

It is worth noting that although ECMdd is similar to the Median
Evidential C-Means (MECM) [?] algorithm in principle, the two
methods are very different in how they deal with the imprecise
classes and in the way they calculate the dissimilarities between
objects and imprecise classes. Although both MECM and ECMdd
consider the dissimilarities of objects to the prototypes for
specific clusters, the strategy adopted by ECMdd is simpler and
more intuitive. Moreover, there is no representative medoid for
imprecise classes in MECM.

¹ The notation $v_k^{\Omega}$ denotes the prototype of specific class
$\omega_k$, thus it is in the framework of $\Omega$. Similarly,
$v_j^{2^\Omega}$ is defined on the power set $2^\Omega$, representing
the prototype of the focal set $A_j \in 2^\Omega$. It is easy to see
that $\{v_k^{\Omega} : k = 1, 2, \ldots, c\} \subseteq \{v_j^{2^\Omega} : j = 1, 2, \ldots, 2^c - 1\}$.

B. The optimization 

To minimize $J_{ECMdd}$, an optimization scheme via an
Expectation-Maximization (EM) algorithm can be designed,
and the alternate update steps are as follows:

Step 1. Credal partition (M) update.

The bbas of objects' class membership for any subset
$A_j \subseteq \Omega$ and the empty set $\emptyset$ representing the
outliers are updated as in ECM [4]:

• $\forall A_j \subseteq \Omega, A_j \neq \emptyset$,

$$ m_{ij} = \frac{|A_j|^{-\alpha/(\beta-1)} d_{ij}^{-1/(\beta-1)}}{\sum_{A_k \neq \emptyset} |A_k|^{-\alpha/(\beta-1)} d_{ik}^{-1/(\beta-1)} + \delta^{-2/(\beta-1)}} \tag{27} $$

• If $A_j = \emptyset$,

$$ m_{i\emptyset} = 1 - \sum_{A_j \neq \emptyset} m_{ij} \tag{28} $$

Step 2. Prototype (V) update.

The prototype $v_k^{\Omega}$ of a specific (singleton) cluster
$\omega_k$ $(k = 1, 2, \ldots, c)$ can be updated first and then the
prototypes of imprecise (meta) classes could be determined
by Eq. (23). For singleton clusters $\omega_k$
$(k = 1, 2, \ldots, c)$, the corresponding new prototype
$v_k^{\Omega}$ could be set to $x_l \in X$ such that

$$ x_l = \arg\min_{v'_k} \Big\{ \sum_{i=1}^{n} \sum_{A_j \ni \omega_k} m_{ij}^{\beta} d_{ij}(v'_k) : v'_k \in X \Big\}. \tag{29} $$

The dissimilarity between object $x_i$ and cluster $A_j$, $d_{ij}$,
is a function of $v'_k$, which is the potential prototype of class
$\omega_k$.

The bbas of the objects' class assignment are updated
as in ECM [4], but it is worth noting that $d_{ij}$ has a
different meaning from that in ECM, although in both cases it
measures the dissimilarity between object $x_i$ and class $A_j$. In
ECM $d_{ij}$ is the distance between object $i$ and the centroid
point of $A_j$, while in ECMdd it is the dissimilarity between
$x_i$ and the most "possible" medoid. The prototype updating
process takes into consideration the fact that the prototypes are
assumed to be data objects. Therefore, when the credal partition
matrix $M$ is fixed, the new prototype of each cluster can be
obtained in a simpler manner than in the case of ECM. The ECMdd
algorithm is summarized as Algorithm 1.
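The credal partition update of Step 1 can be sketched as follows. This is an illustrative sketch under stated assumptions: focal sets are enumerated as tuples of cluster indices by a hypothetical helper `focal_sets`, the dissimilarities `d[i][j]` to each focal set are taken as given, the empty-set term uses $\delta^2$ as it appears in the objective function (20), and a small constant `eps` (an addition made here) guards against a zero dissimilarity when an object coincides with a medoid.

```python
from itertools import combinations


def focal_sets(c):
    """All non-empty subsets of {0, ..., c-1}, singletons first."""
    return [s for r in range(1, c + 1) for s in combinations(range(c), r)]


def update_masses(d, sets, alpha=1.0, beta=2.0, delta=10.0, eps=1e-12):
    """Mass update for fixed prototypes (Step 1).

    d[i][j] : dissimilarity between object i and focal set sets[j]
    Returns (m, m_empty): masses on the non-empty sets and on the empty set.
    """
    e = 1.0 / (beta - 1.0)
    noise = (delta ** 2) ** (-e)        # weight of the empty set
    m, m_empty = [], []
    for row in d:
        w = [len(A) ** (-alpha * e) * max(dij, eps) ** (-e)
             for A, dij in zip(sets, row)]
        total = sum(w) + noise
        masses = [x / total for x in w]
        m.append(masses)
        m_empty.append(1.0 - sum(masses))   # Eq. (28)
    return m, m_empty


sets = focal_sets(2)                     # [(0,), (1,), (0, 1)]
m, m_empty = update_masses([[1.0, 4.0, 2.0]], sets)
print(m[0], m_empty[0])
```

With `alpha = 1`, the imprecise set (0, 1) is penalized by its cardinality, so in this toy row it receives the same mass as the far singleton even though its dissimilarity is smaller.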

We now discuss the convergence of ECMdd. The
assignment update process will not increase $J_{ECMdd}$ since
the new mass matrix is determined by differentiating the
Lagrangian of the cost function with respect to $M$.
Also $J_{ECMdd}$ will not increase through the medoid-searching
scheme for prototypes of specific classes. If the prototypes of
specific classes are fixed, the medoids of imprecise classes
determined by Eq. (23) are likely to be located near the
"centroid" of all the prototypes of the included specific classes.
If the objects are in Euclidean space, the medoids of imprecise
classes are near the centroids found in ECM. Thus this step will
not increase the value of the objective function either. Moreover,
the bba $M$ is a function of the prototypes $V$, and for a given
$V$ the assignment $M$ is unique. Because ECMdd assumes
that the prototypes are original objects in $X$, there is
a finite number of different prototype vectors $V$, and hence a
finite number of corresponding credal partitions $M$. Consequently
we can conclude that the ECMdd algorithm converges in a
finite number of steps.


Algorithm 1 : ECMdd algorithm

Input: Dissimilarity matrix $[\tau(x_i, x_j)]_{n \times n}$ for the $n$
objects $\{x_1, x_2, \ldots, x_n\}$.

Parameters:

$c$: number of clusters, $1 < c < n$
$\alpha$: weighting exponent for cardinality
$\beta > 1$: weighting exponent
$\delta > 0$: dissimilarity between any object and the empty set
$\eta > 0$: to distinguish the outliers from the possible medoids
$\gamma \in [0, 1]$: balance of the contribution for imprecise classes

Initialization:

Choose randomly $c$ initial prototypes from the object set

repeat

(1). $t \leftarrow t + 1$

(2). Compute $M_t$ using Eq. (27), Eq. (28) and $V_{t-1}$

(3). Compute the new prototype set $V_t$ using Eq. (29) and Eq. (23)

until the prototypes remain unchanged.

Output: The optimal credal partition.


C. The parameters of the algorithm 

As in ECM, before running ECMdd, the values of the
parameters have to be set. Parameters $\alpha$, $\beta$ and $\delta$
have the same meanings as those in ECM. The value $\beta = 2$, which
is a usual choice, can be used in all experiments. The
parameter $\alpha$ aims to penalize the subsets with high cardinality
and control the number of points assigned to imprecise clusters
for credal partitions. The higher $\alpha$ is, the less mass of
belief is assigned to the meta clusters and the less imprecise the
resulting partition will be. However, the decrease of imprecision
may result in a high risk of errors. For instance, in the case of
hard partitions, the clustering results are completely precise
but there is a much stronger tendency to assign an object to
an unrelated group. As suggested in [4], a default value of
$\alpha$ can be used as a starting point and then modified according
to what the user expects. The choice of $\delta$ is more difficult
and is strongly data dependent [4]. In ECMdd, parameter
$\gamma$ weighs the contribution of uncertainty to the dissimilarity
between objects and imprecise clusters. Parameter $\eta$ is used
to distinguish the outliers from the possible medoids when
determining the prototypes of meta classes. It could be set to 1
by default and it has little effect on the final partition results.







For determining the number of clusters, the validity index
of a credal partition defined in [4] could be utilised:

$$ N^*(c) \triangleq \frac{1}{n \log_2(c)} \sum_{i=1}^{n} \Big[ \sum_{A \in 2^\Omega \setminus \emptyset} m_i(A) \log_2 |A| + m_i(\emptyset) \log_2(c) \Big], \tag{30} $$

where $0 \leq N^*(c) \leq 1$. This index has to be minimized to get
the optimal number of clusters.
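Eq. (30) is straightforward to evaluate once a credal partition is available. The sketch below assumes each object's bba is a dictionary from focal sets (`frozenset`, with the empty `frozenset()` for outliers) to masses; the function name `nonspecificity` and the toy partitions are illustrative assumptions.

```python
from math import log2


def nonspecificity(masses, c):
    """Eq. (30): average normalized nonspecificity of a credal partition.

    masses : list of dicts {focal set (frozenset): mass}, one per object
    c      : number of clusters (c > 1)
    """
    total = 0.0
    for m in masses:
        for A, v in m.items():
            # Mass on the empty set is weighted like total ignorance.
            total += v * (log2(len(A)) if A else log2(c))
    return total / (len(masses) * log2(c))


w1, w2 = frozenset({0}), frozenset({1})
crisp = [{w1: 1.0}, {w2: 1.0}]        # all mass on singletons
vague = [{w1 | w2: 1.0}]              # all mass on the whole frame
print(nonspecificity(crisp, 2), nonspecificity(vague, 2))
```

The index is 0 for a fully specific partition and 1 when every object is assigned to the whole frame, which is why smaller values are preferred when comparing candidate numbers of clusters.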


IV. Experiments 


In this section some experiments on various data sets
are performed to show the effectiveness of ECMdd. The
results are compared with those of FCMdd and MECM to illustrate
the merits of the proposed method.

The c-means type clustering algorithms are sensitive to the
initial prototypes. In this work, we follow the initialization
procedure used in [2] and [?] to generate a set
of $c$ initial prototypes one by one. The first medoid, $\sigma_1$, is
randomly picked from the data set. The rest of the medoids are
selected successively one by one in such a way that each one
is most dissimilar to all the medoids that have already been
picked. Suppose $\sigma = \{\sigma_1, \sigma_2, \ldots, \sigma_j\}$ is
the set of the first chosen $j$ $(j < c)$ medoids. Then the
$(j+1)$-th medoid, $\sigma_{j+1}$, is set to the object $x_p$ with

$$ p = \arg\max_{1 \leq i \leq n;\ x_i \notin \sigma} \Big\{ \min_{\sigma_k \in \sigma} \tau(x_i, \sigma_k) \Big\}. \tag{31} $$
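The selection rule of Eq. (31) can be sketched as a farthest-first traversal over the dissimilarity matrix. In this sketch the first medoid is passed in explicitly through the hypothetical parameter `first` so that the example is reproducible; the procedure described above picks it at random. The function name `init_medoids` and the toy data are assumptions.

```python
def init_medoids(tau, c, first):
    """Pick c initial medoids: start from `first`, then repeatedly add the
    object whose smallest dissimilarity to the chosen medoids is largest."""
    chosen = [first]
    while len(chosen) < c:
        rest = [i for i in range(len(tau)) if i not in chosen]
        nxt = max(rest, key=lambda i: min(tau[i][k] for k in chosen))
        chosen.append(nxt)
    return chosen


# Toy relational data: absolute differences between points on a line.
points = [0, 1, 2, 10, 11, 12]
tau = [[abs(a - b) for b in points] for a in points]
print(init_medoids(tau, c=3, first=1))
```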


This selection process makes the initial prototypes evenly
distributed and located as far away from each other as possible.
The popular measures Precision (P), Recall (R) and Rand Index
(RI), which are typically used to evaluate the performance
of hard clusterings, are also used here. Precision is the fraction
of relevant instances (pairs in identical groups in the clustering
benchmark) out of those retrieved instances (pairs in identical
groups of the discovered clusters), while recall is the fraction
of relevant instances that are retrieved. Precision and
recall can then be calculated by

$$ P = \frac{a}{a + c} \quad \text{and} \quad R = \frac{a}{a + d} \tag{32} $$

respectively, where $a$ (respectively, $b$) is the number of
pairs of objects simultaneously assigned to identical classes
(respectively, different classes) by the standard reference
partition and the obtained one. Similarly, values $c$ and $d$ are the
number of dissimilar pairs partitioned into the same cluster
and the number of similar object pairs clustered into different
clusters, respectively. The Rand index measures the percentage
of correct decisions and it can be defined as

$$ RI = \frac{2(a + b)}{n(n - 1)}, \tag{33} $$

where $n$ is the number of data objects.
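The pair counts $a$, $b$, $c$, $d$ and the three measures of Eqs. (32) and (33) can be computed directly from two label vectors. The following is a minimal sketch; the function names and the toy labelings are illustrative, and the ratios are assumed to have non-zero denominators.

```python
from itertools import combinations


def pair_counts(truth, pred):
    """Count object pairs by agreement between reference and obtained labels."""
    a = b = c = d = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = pred[i] == pred[j]
        if same_truth and same_pred:
            a += 1      # together in both partitions
        elif not same_truth and not same_pred:
            b += 1      # apart in both partitions
        elif same_pred:
            c += 1      # dissimilar pair put into the same cluster
        else:
            d += 1      # similar pair split into different clusters
    return a, b, c, d


def precision_recall_ri(truth, pred):
    """Eq. (32) and Eq. (33)."""
    a, b, c, d = pair_counts(truth, pred)
    n = len(truth)
    return a / (a + c), a / (a + d), 2 * (a + b) / (n * (n - 1))


truth = [0, 0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 1, 1]
print(pair_counts(truth, pred))
print(precision_recall_ri(truth, pred))
```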

For fuzzy and evidential clusterings, objects may be
partitioned into multiple clusters with different degrees. In such
cases precision would consequently be low [12]. Usually the
fuzzy and evidential clusters are made crisp before calculating
the measures, using for instance the maximum membership
criterion [?] and pignistic probabilities [4]. Thus in this work
we harden the fuzzy and credal clusters by maximizing
the corresponding membership and pignistic probabilities, and
calculate precision, recall and RI for each case.

The introduced imprecise clusters can avoid the risk of
grouping a datum into a specific class without strong belief. In
other words, a data pair can be clustered into the same specific
group only when we are quite confident, and thus the
misclassification rate will be reduced. However, partitioning too
many data into imprecise clusters means that many objects are not
identified with their precise groups. In order to show the
effectiveness of the proposed method in these aspects, we
use the following indices for evaluating credal partitions:
Evidential Precision (EP), Evidential Recall (ER) and Evidential
Rand Index (ERI) [?], defined as:

$$ EP = \frac{n_{er}}{N_e}, \quad ER = \frac{n_{er}}{N_r}, \quad ERI = \frac{2(a^* + b^*)}{n(n - 1)}. \tag{34} $$


In Eq. (34), the notation $N_e$ denotes the number of pairs
partitioned into the same specific group by evidential clusterings,
and $n_{er}$ is the number of relevant instance pairs out of these
specifically clustered pairs. The value $N_r$ denotes the number
of pairs in the same group of the clustering benchmark, and
ER is the fraction of specifically retrieved instances (grouped
into an identical specific cluster) out of these relevant pairs.
Value $a^*$ (respectively, $b^*$) is the number of pairs of objects
simultaneously clustered to the same specific class (i.e.,
singleton class; respectively, different classes) by the standard
reference partition and the obtained credal one. When the partition
degrades to a crisp one, EP, ER and ERI are equal to the classical
precision, recall and Rand index measures respectively. EP and
ER reflect the accuracy of the credal partition from different
points of view, but the clusterings cannot be evaluated from
one single term. For example, if all the objects are partitioned
into imprecise clusters except two relevant data objects grouped
into a specific class, then EP = 1. But we could not say
this is a good partition since it does not provide us with any
information of great value. In this case ER ≈ 0. Thus ER could
be used to express the efficiency of the method for providing
valuable partitions. ERI is like a combination of EP and
ER describing the accuracy of the clustering results. Note that
for evidential clusterings, precision, recall and RI measures
are calculated after the corresponding hard partitions are obtained,
while EP, ER and ERI are based on hard credal partitions [4].
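The definitions of EP and ER translate directly into pair counting on a hard credal partition. The sketch below covers only these two indices; the function name, the representation of each object's hard credal class as a `frozenset` of cluster labels, and the toy data are assumptions made for illustration, and both denominators are assumed to be non-zero.

```python
from itertools import combinations


def evidential_precision_recall(truth, credal):
    """EP and ER of Eq. (34).

    truth  : reference class label of each object
    credal : hard credal class of each object, as a frozenset of labels
             (a singleton for a specific class, larger for an imprecise one)
    """
    n_e = n_er = n_r = 0
    for i, j in combinations(range(len(truth)), 2):
        relevant = truth[i] == truth[j]
        specific = len(credal[i]) == 1 and credal[i] == credal[j]
        n_r += relevant                  # pairs together in the benchmark
        n_e += specific                  # pairs in the same specific class
        n_er += relevant and specific    # relevant pairs among those
    return n_er / n_e, n_er / n_r


truth = [0, 0, 0, 1, 1, 1]
credal = [frozenset({0}), frozenset({0}), frozenset({0, 1}),
          frozenset({1}), frozenset({1}), frozenset({1})]
ep, er = evidential_precision_recall(truth, credal)
print(ep, er)
```

In this toy case the third object is left in the imprecise class {0, 1}, so no wrong pair is formed (EP = 1) but the two relevant pairs involving it are not retrieved, which lowers ER.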


A. Karate Club network 

Graph visualization is commonly used to visually model
relations in many areas. For graphs such as social networks, the
prototype of one group is likely to be one of the persons (i.e.,
nodes in the graph) playing the leader role in the community.
Moreover, a graph (network) of vertices and edges usually
describes the interactions between different agents of the
complex system, and the pairwise relationships between nodes
are often implied in the graph data sets. Thus medoid-based
relational clustering algorithms could be directly applied. In
this section we evaluate the effectiveness of the proposed
method on community detection problems. Here we
test on a widely used benchmark in detecting community
structures, "Karate Club", studied by Wayne Zachary. The
network consists of 34 nodes and 78 edges representing the
friendship among the members of the club (see Figure 1a).

There are many similarity and dissimilarity indices for
networks, using local or global information of the graph structure.
In this experiment, different similarity metrics are compared
first. The similarity indices considered here are listed in Table
I. Note that the similarities given by these measures range from
0 to 1, thus they can be converted into dissimilarities simply
by dissimilarity = 1 − similarity. The comparison results
for different dissimilarity indices by FCMdd and ECMdd are
shown in Table II and Table III respectively. As we can
see, for all the dissimilarity indices, the value of evidential
precision obtained by ECMdd is at least as high as that of
precision. This can be attributed to the introduced imprecise
classes, which enable us not to make a hard decision for the nodes
about which we are uncertain and consequently guarantee the
accuracy of the specific clustering results. From the tables we can
also see that the performance using the dissimilarity measure based
on signal propagation is better than that using local similarities,
for both FCMdd and ECMdd. This reflects that the global
dissimilarity metric is better than the local ones for
community detection. Thus in the following experiments, we
only consider the signal dissimilarity index.


TABLE I
Different local and global similarity indices.

Index     Global metric   Ref.     Index     Global metric   Ref.
Jaccard   No              [13]     Zhou      No              [14]
Pan       No              [15]     Signal    Yes


Fig. 1. The Karate Club network. The parameters of MECM are
$\alpha = 1.5$, $\beta = 2$, $\delta = 100$, $\eta = 0.9$, $\gamma = 0.05$.
In ECMdd, $\alpha = 0.05$, $\beta = 2$, $\delta = 100$, $\eta = 1$,
$\gamma = 1$, while in FCMdd, $\beta = 2$.


TABLE II
Comparison of different similarity indices by FCMdd.

Index     P        R        RI       EP       ER       ERI
Jaccard   0.6364   0.7179   0.6631   0.6364   0.7179   0.6631
Pan       0.4866   1.0000   0.4866   0.4866   1.0000   0.4866
Zhou      0.4866   1.0000   0.4866   0.4866   1.0000   0.4866
Signal    0.8125   0.8571   0.8342   0.8125   0.8571   0.8342


TABLE III
Comparison of different similarity indices by ECMdd.

Index     P        R        RI       EP       ER       ERI
Jaccard   0.6458   0.6813   0.6631   0.7277   0.5092   0.6684
Pan       0.6868   0.7070   0.7005   0.7214   0.6923   0.7201
Zhou      0.6522   0.6593   0.6631   0.7460   0.3443   0.6239
Signal    1.0000   1.0000   1.0000   1.0000   0.6190   0.8146


The detected community structures by different methods
are displayed in Figure 1b - 1d. FCMdd could detect the
exact community structure of all the nodes except nodes 3,
14, 20. As we can see from the figures, these three nodes
have connections with both communities. They are partitioned
into imprecise class $\omega_{12} \triangleq \{\omega_1, \omega_2\}$,
which describes the uncertainty on the exact class labels of the
three nodes, by the application of ECMdd. The medoids found by
FCMdd of the two specific communities are node 5 and node 29,
while by ECMdd they are node 5 and node 33. The uncertain nodes
found by MECM are node 3 and node 9.

From this experiment we can see that the imprecise classes
introduced by credal partitions could help us make soft
decisions for the uncertain objects which may lie in the
overlapped area. This could avoid the risk of making errors
simply by hard partitions.


B. Countries data

In this section we test on a direct relational data
set, referred to as the benchmark data set Countries Data [?],
[?]. The task is to group twelve countries into clusters
based on the pairwise relationships given in Table IV,
which are in fact the average dissimilarity scores on some
dimensions of quality of life provided subjectively by students
in a political science class. Generally, these countries
are classified into three categories: Western, Developing
and Communist. We test the performances of FCMdd and
ECMdd with two different sets of initial representative
countries, which are
$\Delta_1$ = {C10: USSR; C8: Israel; C7: India} and
$\Delta_2$ = {C6: France; C4: Cuba; C1: Belgium}. The three
countries in $\Delta_1$ are well separated. On the contrary, for the
countries in $\Delta_2$, Belgium is similar to France, so that
two of the three initial medoids are very close in terms of the
given dissimilarities. The parameters are set as $\beta = 2$ for
FCMdd, and $\beta = 2$, $\alpha = 0.95$, $\eta = 1$, $\gamma = 1$ for
ECMdd.

The results of FCMdd and ECMdd are given in Table [V] 
and Table [VT| respectively. It can be seen that FCMdd is very 
sensitive to initializations. When the initial prototypes are well 






























TABLE IV

Countries data: dissimilarity matrix.

No.  Countries        C1    C2    C3    C4    C5    C6    C7    C8    C9    C10   C11   C12
1    C1: Belgium      0.00  5.58  7.00  7.08  4.83  2.17  6.42  3.42  2.50  6.08  5.25  4.75
2    C2: Brazil       5.58  0.00  6.50  7.00  5.08  5.75  5.00  5.50  4.92  6.67  6.83  3.00
3    C3: China        7.00  6.50  0.00  3.83  8.17  6.67  5.58  6.42  6.25  4.25  4.50  6.08
4    C4: Cuba         7.08  7.00  3.83  0.00  5.83  6.92  6.00  6.42  7.33  2.67  3.75  6.67
5    C5: Egypt        4.83  5.08  8.17  5.83  0.00  4.92  4.67  5.00  4.50  6.00  5.75  5.00
6    C6: France       2.17  5.75  6.67  6.92  4.92  0.00  6.42  3.92  2.25  6.17  5.42  5.58
7    C7: India        6.42  5.00  5.58  6.00  4.67  6.42  0.00  6.17  6.33  6.17  6.08  4.83
8    C8: Israel       3.42  5.50  6.42  6.42  5.00  3.92  6.17  0.00  2.75  6.92  5.83  6.17
9    C9: USA          2.50  4.92  6.25  7.33  4.50  2.25  6.33  2.75  0.00  6.17  6.67  5.67
10   C10: USSR        6.08  6.67  4.25  2.67  6.00  6.17  6.17  6.92  6.17  0.00  3.67  6.50
11   C11: Yugoslavia  5.25  6.83  4.50  3.75  5.75  5.42  6.08  5.83  6.67  3.67  0.00  6.92
12   C12: Zaire       4.75  3.00  6.08  6.67  5.00  5.58  4.83  6.17  5.67  6.50  6.92  0.00

set (the case of $A_1$), the obtained partition is reasonable.
However, the clustering results become worse when the initial
medoids are not ideal (the case of $A_2$). In fact, two of the
three medoids are not changed during the update process of
FCMdd when using the initial prototype set $A_2$. This example
illustrates that FCMdd can easily get stuck in a local
minimum. For ECMdd, the credal partitions are the same under
both initializations. The pignistic probabilities, which can be
regarded as membership values in fuzzy partitions, are also
displayed in Table VI. The country Egypt is clustered into the
imprecise class {1, 2}, which indicates that Egypt does not
clearly belong to Developing or Western alone, but rather to
both categories. This result is consistent with the
dissimilarity matrix: Egypt is similar to both USA
and India, but has the largest dissimilarity to China. From this
experiment we can conclude that ECMdd is more robust to
initialization than FCMdd.
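
The FCMdd memberships reported for $A_1$ in Table V can be checked by hand. The sketch below assumes the standard FCMdd membership update, $u_{ij} = d_{ij}^{-1/(\beta-1)} / \sum_k d_{ik}^{-1/(\beta-1)}$, with $\beta = 2$ and the final medoids Israel, India and USSR. The function name `fcmdd_memberships` is hypothetical, and the dissimilarities are copied from Table IV.

```python
def fcmdd_memberships(d_to_medoids, beta=2.0):
    """Fuzzy memberships of one object given its dissimilarities to the medoids."""
    # An object that coincides with a medoid gets full membership in that class.
    if any(d == 0 for d in d_to_medoids):
        return [1.0 if d == 0 else 0.0 for d in d_to_medoids]
    weights = [d ** (-1.0 / (beta - 1.0)) for d in d_to_medoids]
    total = sum(weights)
    return [w / total for w in weights]

# Dissimilarities to the medoids (Israel, India, USSR), from Table IV.
belgium = [3.42, 6.42, 6.08]
egypt = [5.00, 4.67, 6.00]

u_belgium = fcmdd_memberships(belgium)
u_egypt = fcmdd_memberships(egypt)
print([round(u, 4) for u in u_belgium])
print([round(u, 4) for u in u_egypt])
```

Under these assumptions the values agree with the Belgium and Egypt rows of Table V for $A_1$ up to rounding. For Egypt the two largest memberships are close to each other, yet FCMdd still has to commit to a single class, whereas ECMdd can assign the imprecise class {1, 2}.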

From Table VI we can also see the medoid of each class.
For instance, China is the medoid of its cluster (Communist
countries) no matter which initial prototype set is used. This
reflects the important role of China among the Communist countries
and its pronounced Communist characteristics.
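
The BetP columns of Table VI are pignistic probabilities derived from the credal partition. As a reminder of how this transformation behaves, here is a small Python sketch of the standard pignistic transformation, $BetP(\omega) = \sum_{A \ni \omega} m(A) / (|A| (1 - m(\emptyset)))$. The mass values are hypothetical and chosen only for illustration; they are not the masses produced by ECMdd for any country. The hard credal label is taken here, as an assumption, to be the focal set with the largest mass.

```python
def pignistic(masses):
    """Map a mass function {frozenset: mass} to pignistic probabilities.

    Mass on the empty set is removed by renormalisation, and each focal
    set shares its mass equally among its elements.
    """
    conflict = masses.get(frozenset(), 0.0)
    if conflict >= 1.0:
        raise ValueError("all mass is on the empty set")
    betp = {}
    for focal, mass in masses.items():
        for omega in focal:
            share = mass / (len(focal) * (1.0 - conflict))
            betp[omega] = betp.get(omega, 0.0) + share
    return betp

# Hypothetical masses for one object over the frame {1, 2, 3}.
m = {
    frozenset({1}): 0.2,
    frozenset({2}): 0.2,
    frozenset({3}): 0.1,
    frozenset({1, 2}): 0.4,
    frozenset({1, 2, 3}): 0.1,
}

betp = pignistic(m)
hard_label = max(m, key=m.get)  # focal set with the largest mass
print(betp, sorted(hard_label))
```

In this toy case the object receives the imprecise label {1, 2} while its pignistic probabilities for classes 1 and 2 are the two largest, which is qualitatively the situation of Egypt in Table VI.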


C. UCI data sets 

Finally, the clustering performance of the different methods
is compared on two benchmark UCI relational data
sets: the “Cat cortex” data set and the “Protein” data set. The given
information for these data sets is pairwise relationship values.
The former is a matrix of connection strengths between
65 cortical areas of the cat brain, while the latter is
a dissimilarity matrix measuring the structural proximity of
213 protein sequences. The comparison results for the different
evaluation indices are displayed in Figure 2. For ECMdd
and MECM, the classical Precision (P), Recall (R) and Rand
Index (RI) are calculated based on the pignistic probabilities,
and the corresponding evidential indices are obtained from
the hard credal partition 0. As can be seen, the three
classical measures are almost the same for all the methods.
This reflects that pignistic probabilities play a similar role
to fuzzy membership. However, for ECMdd and
MECM, EP is significantly high. This effect can be attributed
to the introduced imprecise clusters, which enable us to make a
compromise decision between hard ones. But as many points
are clustered into imprecise classes, the evidential recall value
is low. The performance of ECMdd is slightly better than that of
MECM. Moreover, the expression of imprecise classes
in ECMdd is simpler than that in MECM, and the
experiment shows that ECMdd is more efficient than MECM
in terms of execution time.
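
For reference, the classical indices can be computed by pair counting. The sketch below assumes the usual definitions P = TP/(TP+FP), R = TP/(TP+FN) and RI = (TP+TN)/(TP+FP+FN+TN) over all pairs of objects, applied to hard labels such as those obtained by taking the class with the largest pignistic probability. The function name and the toy labels are hypothetical and unrelated to the UCI data.

```python
from itertools import combinations

def pair_counting_indices(truth, pred):
    """Return (precision, recall, rand_index) computed over all object pairs."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = pred[i] == pred[j]
        if same_pred and same_truth:
            tp += 1      # grouped together, correctly
        elif same_pred:
            fp += 1      # grouped together, wrongly
        elif same_truth:
            fn += 1      # separated, wrongly
        else:
            tn += 1      # separated, correctly
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    rand_index = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, rand_index

# Toy example: six objects, one of them placed in the wrong cluster.
truth = [1, 1, 1, 2, 2, 2]
pred = [1, 1, 2, 2, 2, 2]

P, R, RI = pair_counting_indices(truth, pred)
print(P, R, RI)
```

The evidential counterparts discussed above differ in that pairs involving objects in imprecise classes are not counted as specific decisions, which is consistent with a high evidential precision and a low evidential recall.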




Fig. 2. The clustering results for two UCI data sets (panel c: RI).

V. Conclusion 

In this paper, evidential c-medoids clustering is proposed
as a new medoid-based clustering algorithm. The proposed
approach is an extension of crisp c-medoids and fuzzy
c-medoids in the framework of belief function theory. Through the
introduced imprecise clusters, we can find overlapping
and indistinguishable clusters for uncertain patterns. This
results in higher accuracy of the specific decisions. The
experimental results illustrate the advantages of credal partitions

TABLE V

Clustering results of FCMdd for countries data. The prototype (medoid) of each class is marked with *.

                      FCMdd with $A_1$                        FCMdd with $A_2$
No.  Countries        u_i1    u_i2    u_i3    Label  Medoids  u_i1    u_i2    u_i3    Label  Medoids
1    C1: Belgium      0.4773  0.2543  0.2685  1      -        1.0000  0.0000  0.0000  1      *
2    C6: France       0.4453  0.2719  0.2829  1      -        0.0000  1.0000  0.0000  2      *
3    C8: Israel       1.0000  0.0000  0.0000  1      *        0.4158  0.3627  0.2215  1      -
4    C9: USA          0.5319  0.2311  0.2371  1      -        0.4078  0.4531  0.1391  2      -
5    C3: China        0.2731  0.3143  0.4126  3      -        0.2579  0.2707  0.4714  3      -
6    C4: Cuba         0.2235  0.2391  0.5374  3      -        0.0000  0.0000  1.0000  3      *
7    C10: USSR        0.0000  0.0000  1.0000  3      *        0.2346  0.2312  0.5342  3      -
8    C11: Yugoslavia  0.2819  0.2703  0.4478  3      -        0.2969  0.2875  0.4156  3      -
9    C2: Brazil       0.3419  0.3761  0.2820  2      -        0.3613  0.3506  0.2880  1      -
10   C5: Egypt        0.3444  0.3687  0.2870  2      -        0.3558  0.3493  0.2948  1      -
11   C7: India        0.0000  1.0000  0.0000  2      *        0.3257  0.3257  0.3485  3      -
12   C12: Zaire       0.3099  0.3959  0.2942  2      -        0.3901  0.3321  0.2778  1      -

TABLE VI

Clustering results of ECMdd for countries data. The prototype (medoid) of each class is marked with *. The label {1,2} represents the imprecise class expressing the uncertainty on class 1 and class 2.

                      ECMdd with $A_1$                           ECMdd with $A_2$
No.  Countries        BetP_i1  BetP_i2  BetP_i3  Label  Medoids  BetP_i1  BetP_i2  BetP_i3  Label  Medoids
1    C1: Belgium      1.0000   0.0000   0.0000   1      *        1.0000   0.0000   0.0000   1      *
2    C6: France       0.4932   0.2633   0.2435   1      -        0.5149   0.2555   0.2297   1      -
3    C8: Israel       0.4144   0.3119   0.2738   1      -        0.4231   0.3051   0.2719   1      -
4    C9: USA          0.4503   0.2994   0.2503   1      -        0.4684   0.2920   0.2396   1      -
5    C3: China        0.2323   0.2294   0.5383   3      *        0.0000   0.0000   1.0000   3      *
6    C4: Cuba         0.2778   0.2636   0.4586   3      -        0.2899   0.2794   0.4307   3      -
7    C10: USSR        0.2509   0.2260   0.5231   3      -        0.3167   0.2849   0.3984   3      -
8    C11: Yugoslavia  0.3478   0.2488   0.4034   3      -        0.3579   0.2526   0.3895   3      -
9    C2: Brazil       0.0000   1.0000   0.0000   2      *        0.0000   1.0000   0.0000   2      *
10   C5: Egypt        0.3755   0.3686   0.2558   {1,2}  -        0.3845   0.3777   0.2378   {1,2}  -
11   C7: India        0.3125   0.3650   0.3226   2      -        0.2787   0.3740   0.3473   2      -
12   C12: Zaire       0.3081   0.4336   0.2583   2      -        0.3068   0.4312   0.2619   2      -

produced by ECMdd. In real applications, using only one medoid per
class may not adequately model different types of group structure
and hence limits the clustering performance on complex data
sets. Therefore, we intend to include multiple-prototype
representation of classes in our future research work.
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